AITopics | visual learning

Grasp Proposal Networks: An End-to-End Solution for Visual Learning of Robotic Grasps

Neural Information Processing Systems | Dec-24-2025, 08:36:17 GMT

Learning robotic grasps from visual observations is a promising yet challenging task. Recent research shows its great potential by preparing and learning from large-scale synthetic datasets. For the popular 6-degree-of-freedom (6-DOF) grasp setting with a parallel-jaw gripper, most existing methods heuristically sample grasp candidates and then evaluate them using learned scoring functions. This strategy is limited by the conflict between sampling efficiency and coverage of optimal grasps. To this end, we propose a novel, end-to-end Grasp Proposal Network (GPNet) that predicts a diverse set of 6-DOF grasps for an unseen object observed from a single, unknown camera view. GPNet builds on a grasp proposal module that defines anchors of grasp centers at discrete but regular 3D grid corners, a design flexible enough to support either more precise or more diverse grasp predictions. To test GPNet, we contribute a synthetic dataset of 6-DOF object grasps; evaluation is conducted using rule-based criteria, simulation tests, and real-world tests. Comparative results show the advantage of our method over existing ones. Notably, GPNet gains better simulation results via the specified coverage, which helps achieve a ready translation to the real test.
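
The idea of placing grasp-center anchors at regular 3D grid corners can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `grid_anchors` and `nearest_anchor`, the use of the point cloud's axis-aligned bounding box, and the default `resolution` are all assumptions made for illustration.

```python
import numpy as np


def grid_anchors(points: np.ndarray, resolution: int = 10) -> np.ndarray:
    """Return (resolution**3, 3) anchor positions at the corners of a regular
    3D grid spanning the axis-aligned bounding box of `points` (assumed layout)."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    axes = [np.linspace(lo[d], hi[d], resolution) for d in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    return np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)


def nearest_anchor(anchors: np.ndarray, center) -> int:
    """Index of the anchor closest (Euclidean distance) to a grasp center."""
    dists = np.linalg.norm(anchors - np.asarray(center, dtype=float), axis=1)
    return int(np.argmin(dists))


# Hypothetical usage: a two-point "cloud" spanning the unit cube.
cloud = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
anchors = grid_anchors(cloud, resolution=3)  # 27 anchors
```

Under these assumptions, a larger `resolution` gives denser anchors (finer localization of grasp centers), while a smaller one gives fewer, more spread-out proposals.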

end-to-end solution, grasp proposal network, visual learning, (8 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.59)

Technology: Information Technology > Artificial Intelligence > Robots (0.62)

a9ad92a81748a31ef6f2ef68d775da46-Paper-Conference.pdf

Neural Information Processing Systems | Oct-10-2025, 23:37:56 GMT

curriculum, infant, learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States > Indiana (0.05)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.94)
Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(3 more...)

Review for NeurIPS paper: Grasp Proposal Networks: An End-to-End Solution for Visual Learning of Robotic Grasps

Neural Information Processing Systems | Jan-26-2025, 21:03:08 GMT

Additional Feedback: The advantage of this approach over [33][34], as mentioned in Line 40, is mostly computational. However, no computational analysis is provided to support this claim. Do these approaches achieve a diverse set of robust grasps when given enough time, and if so, how much time does it take? The code for these approaches is publicly available. Is there a theoretical limitation of the approach?

end-to-end solution, grasp proposal network, visual learning, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Robots (0.40)

Review for NeurIPS paper: Grasp Proposal Networks: An End-to-End Solution for Visual Learning of Robotic Grasps

Neural Information Processing Systems | Jan-26-2025, 21:03:01 GMT

This paper proposes an approach to predict multiple stable 6-DOF grasp parameters, with associated confidence values, for standard parallel-jaw grippers from object point cloud inputs. Grasps are represented as tuples (the contact points of the two jaws and the pitch angle of the gripper), which motivates the new architectural choices proposed here, inspired by standard architectures in 2D object detection. While the network is trained end-to-end, it is internally decomposed in a sensible stage-wise manner. The authors also create a synthetic dataset of 22.6M 6-DOF grasps built on ShapeNet objects using physics simulation, which, upon public release, will be the largest such dataset. Finally, there are some limited transfer results that demonstrate transferability to real-world grasping with an acceptable performance drop.
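
The grasp representation described in this review (two jaw contact points plus a gripper pitch angle, with a confidence value) could be held in a small record type. The sketch below is only an illustration: the class name `Grasp`, its field names, the choice of radians, and the derived `center` and `width` properties are assumptions, not the paper's data format.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float, float]  # (x, y, z); requires Python 3.9+


@dataclass(frozen=True)
class Grasp:
    """Hypothetical container for one predicted grasp."""
    contact_a: Point      # contact point of the first jaw
    contact_b: Point      # contact point of the second jaw
    pitch: float          # gripper pitch angle, assumed to be in radians
    score: float = 0.0    # confidence value attached to the prediction

    @property
    def center(self) -> Point:
        """Midpoint between the two contact points."""
        return tuple((a + b) / 2 for a, b in zip(self.contact_a, self.contact_b))

    @property
    def width(self) -> float:
        """Jaw opening implied by the two contact points."""
        return math.dist(self.contact_a, self.contact_b)


# Hypothetical usage: keep the highest-scoring grasp from a list of predictions.
grasps = [
    Grasp((0.0, 0.0, 0.0), (0.08, 0.0, 0.0), pitch=0.0, score=0.9),
    Grasp((0.0, 0.0, 0.0), (0.0, 0.05, 0.0), pitch=0.3, score=0.4),
]
best = max(grasps, key=lambda g: g.score)
```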

end-to-end solution, grasp proposal network, visual learning, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Robots (0.44)

Active Gaze Behavior Boosts Self-Supervised Object Learning

Yu, Zhengyang, Aubret, Arthur, Raabe, Marcel C., Yang, Jane, Yu, Chen, Triesch, Jochen

arXiv.org Artificial Intelligence | Nov-4-2024

Due to significant variations in the projection of the same object from different viewpoints, machine learning algorithms struggle to recognize the same object across various perspectives. In contrast, toddlers quickly learn to recognize objects from different viewpoints with almost no supervision. Recent works argue that toddlers develop this ability by mapping close-in-time visual inputs to similar representations while interacting with objects. High-acuity vision is only available in the central visual field, which may explain why toddlers (much like adults) constantly move their gaze around during such interactions. It is unclear whether, and how much, toddlers curate their visual experience through these eye movements to support learning object representations. In this work, we explore whether a bio-inspired visual learning model can harness toddlers' gaze behavior during a play session to develop view-invariant object recognition. Exploiting head-mounted eye tracking during dyadic play, we simulate toddlers' central visual field experience by cropping image regions centered on the gaze location. This visual stream feeds a time-based self-supervised learning algorithm. Our experiments demonstrate that toddlers' gaze strategy supports the learning of invariant object representations. Our analysis also reveals that the limited size of the central visual field where acuity is high is crucial for this. We further find that toddlers' visual experience elicits more robust representations compared to adults', mostly because toddlers look at objects they hold themselves for longer bouts. Overall, our work reveals how toddlers' gaze behavior supports self-supervised learning of view-invariant object recognition.
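
The gaze-centered cropping step mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `gaze_crop`, the square window, pixel-coordinate gaze input, and the choice to shift the window back inside the frame near the borders are all assumptions.

```python
import numpy as np


def gaze_crop(frame: np.ndarray, gaze_xy, size: int) -> np.ndarray:
    """Return a size x size window of `frame` centered on the gaze point.

    `gaze_xy` is (x, y) in pixel coordinates. Near the image border the
    window is shifted so that it stays fully inside the frame (an assumed
    policy; padding would be another option).
    """
    height, width = frame.shape[:2]
    if size > min(height, width):
        raise ValueError("crop size must not exceed the frame dimensions")
    half = size // 2
    x, y = int(round(gaze_xy[0])), int(round(gaze_xy[1]))
    left = min(max(x - half, 0), width - size)
    top = min(max(y - half, 0), height - size)
    return frame[top:top + size, left:left + size]


# Hypothetical usage on a tiny synthetic "frame".
frame = np.arange(100).reshape(10, 10)
patch = gaze_crop(frame, gaze_xy=(5, 5), size=4)  # 4 x 4 window
```

A smaller `size` mimics a narrower central visual field, which is the quantity the abstract reports as important for learning.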

dataset, representation, toddler, (13 more...)

arXiv.org Artificial Intelligence

2411.01969

Country: North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report > Experimental Study (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Grasp Proposal Networks: An End-to-End Solution for Visual Learning of Robotic Grasps

Neural Information Processing Systems | Oct-10-2024, 21:10:54 GMT

end-to-end solution, grasp proposal network, visual learning, (5 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.41)

Technology: Information Technology > Artificial Intelligence > Robots (0.97)

Visual Learning of Arithmetic Operation

Hoshen, Yedid (Hebrew University of Jerusalem) | Peleg, Shmuel (Hebrew University of Jerusalem)

AAAI Conferences | Apr-19-2016

A simple Neural Network model is presented for end-to-end visual learning of arithmetic operations from pictures of numbers. The input consists of two pictures, each showing a 7-digit number. The output, also a picture, displays the number showing the result of an arithmetic operation (e.g., addition or subtraction) on the two input numbers. The concepts of a number, or of an operator, are not explicitly introduced. This indicates that addition is a simple cognitive task, which can be learned visually using a very small number of neurons. Other operations, e.g., multiplication, were not learnable using this architecture. Some tasks were not learnable end-to-end (e.g., addition with Roman numerals), but were easily learnable once broken into two separate sub-tasks: a perceptual Character Recognition sub-task and a cognitive Arithmetic sub-task. This indicates that while some tasks may be easily learnable end-to-end, others may need to be broken into sub-tasks.
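
To make the input/output format concrete, the sketch below builds one picture-to-picture training example for addition. It is only an illustration of the setup described in the abstract: the 3x5 bitmap font, the one-column spacing, the image sizes, and the names `render` and `make_addition_example` are assumptions and are not taken from the paper.

```python
import numpy as np

# Assumed 3x5 bitmap font for the digits 0-9 (1 = ink, 0 = background).
FONT = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "2": ["111", "001", "111", "100", "111"],
    "3": ["111", "001", "111", "001", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "5": ["111", "100", "111", "001", "111"],
    "6": ["111", "100", "111", "101", "111"],
    "7": ["111", "001", "001", "001", "001"],
    "8": ["111", "101", "111", "101", "111"],
    "9": ["111", "101", "111", "001", "111"],
}


def render(number: int, n_digits: int) -> np.ndarray:
    """Render a non-negative integer as a 5 x (4 * n_digits) binary image."""
    text = str(number).zfill(n_digits)
    glyphs = []
    for digit in text:
        glyph = np.array([[int(c) for c in row] for row in FONT[digit]])
        glyphs.append(np.pad(glyph, ((0, 0), (0, 1))))  # one blank column
    return np.hstack(glyphs)


def make_addition_example(a: int, b: int):
    """Return (picture of a, picture of b, picture of a + b)."""
    # The sum of two 7-digit numbers can need 8 digits.
    return render(a, 7), render(b, 7), render(a + b, 8)


x1, x2, y = make_addition_example(1234567, 7654321)
```

A network trained on such triples sees only pixels; neither the digits nor the operator are given symbolically, which matches the setting the abstract describes.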

arithmetic operation, artificial intelligence, machine learning, (13 more...)

AAAI Conferences

Thirtieth AAAI Conference on Artificial Intelligence

Country: Asia > Middle East > Israel (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Visual Learning of Arithmetic Operations

Hoshen, Yedid, Peleg, Shmuel

arXiv.org Artificial Intelligence | Nov-27-2015

A simple Neural Network model is presented for end-to-end visual learning of arithmetic operations from pictures of numbers. The input consists of two pictures, each showing a 7-digit number. The output, also a picture, displays the number showing the result of an arithmetic operation (e.g., addition or subtraction) on the two input numbers. The concepts of a number, or of an operator, are not explicitly introduced. This indicates that addition is a simple cognitive task, which can be learned visually using a very small number of neurons. Other operations, e.g., multiplication, were not learnable using this architecture. Some tasks were not learnable end-to-end (e.g., addition with Roman numerals), but were easily learnable once broken into two separate sub-tasks: a perceptual Character Recognition sub-task and a cognitive Arithmetic sub-task. This indicates that while some tasks may be easily learnable end-to-end, others may need to be broken into sub-tasks.

arithmetic operation, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1506.02264

Country: Asia > Middle East > Israel (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Controlled Recognition Bounds for Visual Learning and Exploration

Karasev, Vasiliy, Chiuso, Alessandro, Soatto, Stefano

Neural Information Processing Systems | Dec-31-2012

We describe the tradeoff between the performance in a visual recognition problem and the control authority that the agent can exercise on the sensing process. We focus on the problem of "visual search" of an object in an otherwise known and static scene, propose a measure of control authority, and relate it to the expected risk and its proxy (conditional entropy of the posterior density). We show this analytically, as well as empirically by simulation using the simplest known model that captures the phenomenology of image formation, including scaling and occlusions. We show that a "passive" agent given a training set can provide no guarantees on performance beyond what is afforded by the priors, and that an "omnipotent" agent, capable of infinite control authority, can achieve arbitrarily good performance (asymptotically). In between these limiting cases, the tradeoff can be characterized empirically.
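
The entropy-based proxy mentioned in the abstract can be illustrated on a toy discrete search problem. The sketch below is not the paper's model: it assumes a finite set of candidate object locations, a hand-picked likelihood vector, and hypothetical function names, and it computes the entropy of the posterior after a single observation (the conditional entropy would average this quantity over observations).

```python
import numpy as np


def entropy_bits(p) -> float:
    """Shannon entropy (in bits) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    nonzero = p[p > 0]
    return float(-(nonzero * np.log2(nonzero)).sum())


def bayes_update(prior, likelihood) -> np.ndarray:
    """Posterior over locations given per-location observation likelihoods."""
    posterior = np.asarray(prior, dtype=float) * np.asarray(likelihood, dtype=float)
    return posterior / posterior.sum()


# Toy setup: four candidate locations, uniform prior (2 bits of uncertainty).
prior = np.full(4, 0.25)
# Assumed likelihood of the current measurement under each location hypothesis.
likelihood = np.array([0.9, 0.1, 0.1, 0.1])
posterior = bayes_update(prior, likelihood)

print(entropy_bits(prior), entropy_bits(posterior))
```

In this toy setting, an agent that can choose where to sense would pick measurements that lower the posterior entropy faster, while a passive agent cannot influence which measurements it receives.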

artificial intelligence, control authority, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)
Information Technology > Artificial Intelligence > Vision (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
